253 research outputs found

    Discriminative latent variable models for visual recognition

    Visual recognition is a central problem in computer vision, and it has numerous potential applications in many different fields, such as robotics, human-computer interaction, and entertainment. In this dissertation, we propose two discriminative latent variable models for handling challenging visual recognition problems. In particular, we use latent variables to capture and model various prior knowledge in the training data. In the first model, we address the problem of recognizing human actions from still images. We jointly consider both poses and actions in a unified framework, and treat human poses as latent variables. The learning of this model follows the framework of latent SVM. Secondly, we propose another latent variable model to address the problem of automated tag learning on YouTube videos. In particular, we address the semantic variations (sub-tags) of the videos which have the same tag. In the model, each video is assumed to be associated with a sub-tag label, and we treat this sub-tag label as latent information. This model is trained using a latent learning framework based on LogitBoost, which jointly considers both the latent sub-tag label and the tag label. Moreover, we propose a novel discriminative latent learning framework, kernel latent SVM, which combines the benefits of latent SVM and kernel methods. The framework of kernel latent SVM is general enough to be applied in many applications of visual recognition. It is also able to handle complex latent variables with interdependent structures using composite kernels.

    Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

    We present a method for learning an embedding that places images of humans in similar poses near one another. This embedding can be used as a direct method of comparing images based on human pose, avoiding the potential challenges of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used to allow learning of a representation capable of making distinctions between different poses. Experiments on human pose matching and retrieval from video data demonstrate the potential of the method.
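A triplet-based distance criterion is commonly written as a hinge loss over an anchor, a same-pose (positive) example, and a different-pose (negative) example. The sketch below illustrates that general form only; the squared Euclidean distance, the `margin` value, and the function name `triplet_loss` are assumptions for illustration and are not taken from the abstract.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on squared Euclidean distances.

    Assumed form (not confirmed by the abstract):
        max(0, ||a - p||^2 - ||a - n||^2 + margin)
    Inputs are embedding vectors, or batches of them along the first axis.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # distance to same-pose example
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # distance to different-pose example
    return np.maximum(0.0, d_pos - d_neg + margin)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate that ordering contribute to training.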

    VIDEO THUMBNAIL SELECTION BASED ON DEEP LEARNING

    Video thumbnails are often the first thing a viewer sees when browsing or searching for videos. A frame that is visually representative of the video is typically selected and used as a thumbnail representation of the video. Sometimes, such a thumbnail is not an adequate semantic representation of the video. Further, it is possible that such a thumbnail is not visually pleasing. This disclosure describes deep learning techniques to select video thumbnails that are visually attractive and reflect the content of a video. Thumbnails as described in this disclosure are attractive, improve the likelihood of user selection, and help users find relevant content easily.

    A Dimension-Augmented Physics-Informed Neural Network (DaPINN) with High Level Accuracy and Efficiency

    Physics-informed neural networks (PINNs) have been widely applied in different fields due to their effectiveness in solving partial differential equations (PDEs). However, the accuracy and efficiency of PINNs need to be considerably improved for scientific and commercial use. To address this issue, we systematically propose a novel dimension-augmented physics-informed neural network (DaPINN), which simultaneously and significantly improves the accuracy and efficiency of the PINN. In the DaPINN model, we introduce inductive bias in the neural network to enhance network generalizability by adding a special regularization term to the loss function. Furthermore, we manipulate the network input dimension by inserting additional sample features and incorporating the expanded dimensionality in the loss function. Moreover, we verify the effectiveness of power series augmentation, Fourier series augmentation and replica augmentation, in both forward and backward problems. In most experiments, the error of DaPINN is 1 to 2 orders of magnitude lower than that of PINN. The results show that the DaPINN outperforms the original PINN in terms of both accuracy and efficiency with a reduced dependence on the number of sample points. We also discuss the complexity of the DaPINN and its compatibility with other methods. Comment: 33 pages, 12 figures
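The abstract names three ways of expanding the network input (power series, Fourier series, and replica augmentation) without giving their exact form. The sketch below shows one plausible reading for a one-dimensional input; the function name `augment_inputs`, the `order` parameter, and the specific features chosen are assumptions and may differ from the paper's definitions.

```python
import numpy as np

def augment_inputs(x, kind="fourier", order=2):
    """Append extra features to a 1-D sample array (illustrative only).

    power:   x, x^2, ..., x^(order+1)
    fourier: x, sin(kx), cos(kx) for k = 1..order
    replica: x followed by `order` copies of x
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)  # column of sample points
    if kind == "power":
        extra = [x ** k for k in range(2, order + 2)]
    elif kind == "fourier":
        extra = [f(k * x) for k in range(1, order + 1) for f in (np.sin, np.cos)]
    elif kind == "replica":
        extra = [x.copy() for _ in range(order)]
    else:
        raise ValueError(f"unknown augmentation kind: {kind}")
    return np.hstack([x] + extra)  # shape: (n_samples, 1 + n_extra_features)
```

In this reading the network receives the widened array in place of the raw coordinates; how the added dimensions enter the loss function is not specified in the abstract and is left out here.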

    Multi-task super resolution method for vector field critical points enhancement

    Handling vector field visualization at local critical points is a challenging task. Generally, topology-based methods first divide critical regions into different categories and then process each type of critical region separately to improve the result, which makes the pipeline complex. In this paper, a learning-based multi-task super-resolution (SR) method is proposed to improve the refinement of the vector field and enhance the visualization effect, especially in critical regions. In detail, the multi-task model has two task branches: one simulates the interpolation of discrete vector fields based on an improved super-resolution network, and the other is a classification task that identifies the types of critical vector fields. It is an efficient end-to-end architecture for both the training and inference stages, which simplifies the pipeline of critical vector field visualization and improves the visualization effect. In experiments, we compare our method with both traditional interpolation and a pure SR network on both simulated and real data, and the reported results indicate that our method lowers the error and improves PSNR significantly.
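PSNR, the metric reported above, is conventionally defined as 10·log10(peak² / MSE). The sketch below applies that standard definition to arrays of vector field samples; taking the peak as the largest absolute value in the reference field is an assumption, since the abstract does not say how the peak is chosen.

```python
import numpy as np

def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in dB between two equally shaped arrays."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical fields: no error to measure
    if peak is None:
        # Assumption: use the largest magnitude in the reference as the peak.
        peak = np.max(np.abs(reference))
    return 10.0 * np.log10(peak ** 2 / mse)
```

A higher PSNR means the reconstructed field is closer to the reference, so it moves in the opposite direction to the error.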